Precision annotation of digital samples in NCBI’s Gene Expression Omnibus
Abstract
The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample metadata remains poorly described by unstructured free-text attributes, which prevents its large-scale reanalysis. We introduce the Search Tag Analyze Resource for GEO as a web application (http://STARGEO.org) to curate better annotations of sample phenotypes uniformly across different studies, and to use these sample annotations to define robust genomic signatures of disease pathology by meta-analysis. In this paper, we target a small group of biomedical graduate students to show rapid crowd-curation of precise sample annotations across all phenotypes, and we demonstrate the biological validity of these crowd-curated annotations for breast cancer. STARGEO.org makes GEO data findable, accessible, interoperable, and reusable (i.e., FAIR) to ultimately facilitate knowledge discovery. Our work demonstrates the utility of crowd-curation and interpretation of open 'big data' under FAIR principles as a first step towards realizing an ideal paradigm of precision medicine.
Similar resources
Using the NCBO Web Services for Concept Recognition and Ontology Annotation of Expression Datasets
To provide enhanced access to expression datasets housed in the NCBI’s Gene Expression Omnibus database and to enable new opportunities for data mining we are using the NCBO’s Open Biomedical Annotator service to identify concepts and ontology terms in GEO records. Based on this first pass annotation we are curating these datasets using a variety of ontologies covering concepts of relevance to ...
Generation of transcript counts from pasilla dataset with kallisto
The pasilla dataset was produced by Brooks et al. [1]. The aim of their study was to identify exons that are regulated by pasilla protein, the Drosophila melanogaster ortholog of mammalian NOVA1 and NOVA2 (well studied splicing factors). In their RNA-seq experiment, the libraries were prepared from 7 biologically independent samples: 4 control samples and 3 samples in which pasilla was knocked-...
Study of Gene Expression Signatures for the Diagnosis of Pediatric Acute Lymphoblastic Leukemia (ALL) Through Gene Expression Array Analyses
Background: Acute lymphoblastic leukemia (ALL) as the most common malignancy in children is associated with high mortality and significant relapse. Currently, the non-invasive diagnosis of pediatric ALL is a main challenge in the early detection of patients. In the present study, a systems biology approach was used through network-based analysis to identify the key candidate genes related to AL...
Identification of Prognostic Genes in HER2-enriched Breast Cancer by Gene Co-Expression Network Analysis
Introduction: The HER2-enriched subtype of breast cancer has a worse prognosis than the luminal subtypes. Recently, the discovery of targeted therapies in other groups of breast cancer has increased patient survival. The aim of this study was to identify genes that affect the overall survival of this group of patients based on a systems biology approach. Methods: Gene expression data and clinical infor...
Comparative analysis of hepatocellular carcinoma and cirrhosis gene expression profiles
Gene expression data of hepatocellular carcinoma (HCC) was compared with that of cirrhosis (C) to identify critical genes in HCC. A total of five gene expression data sets were downloaded from Gene Expression Omnibus. HCC and healthy samples were combined as dataset HCC, whereas cirrhosis samples were included in dataset C. A network was constructed for dataset HCC with the package R for perfor...
Journal title:
Volume 4, Issue
Pages -
Publication date: 2017